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Abstract 

Stochasticity and spatial heterogeneity are of great interest recently in 
studying the spread of an infectious disease. The presented method solves 
an inverse problem to discover the effectively decisive topology of a het- 
erogeneous network and reveal the transmission parameters which govern 
the stochastic spreads over the network from a dataset on an infectious 
disease outbreak in the early growth phase. Populations in a combination 
of epidemiological compartment models and a meta-population network 
model are described by stochastic differential equations. Probability den- 
sity functions are derived from the equations and used for the maximal 
likelihood estimation of the topology and parameters. The method is 
tested with computationally synthesized datasets and the WHO dataset 
on SARS outbreak. 



1 Introduction 

When the epidemiologists at a public health agency detect a signal of an in- 
fectious disease outbreak, they rely heavily on mathematical models of disease 
transmission in estimating the rate of transmission, predicting the direction 
and speed of the spread, and figuring out an effective measure to contain the 
outbreak. Many of the models formulate stochasticity and spatial heterogene- 
ity, which are of great interest recently. The spatial heterogeneity ranges from 
the uneven probabilities of contacts between the individuals in communities 
[Walker 2010] , [Small 2006) , dependence of the strength of the demographical in- 
teractions between cities on the distance [Keeling 2004 , to nation-wide or world- 



wide inhomogeneous geographical structures |Dangerfield 200"9] , |Riley 2 007 . 

A Monte-Carlo stochastic simulation is widely used to understand the influ- 
ence of the spatial heterogeneity on stochastic spreading. In such a simulation, 
accuracy and reproducibility of the input demographical knowledge such as the 
amount of traffic between cities have great impacts on the reliability of the out- 
put pattern of the movement of pathogens and their hosts. But, in studying 
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world-wide epidemics, just a collection of regular airline routes and aircraft ca- 
pacities does not always present the transportation network which results in the 
real chain of transmission. Some routes are influential decisively, but the others 
are not. Examples are found in the spread of Severe Acute Respiratory Syndrome 
(SARS) from Asia to the world in 2003. There were not any cases in Japan in 
spite of the heavy traffic there from Asian countries. Many patients appeared 
in Canada earlier than the United States to which airlines connect Asian coun- 
tries much more densely. Here arises an interesting question. Inversely, is it 
possible to learn the effectively decisive transportation network by observing how 
the disease spreads, reinforce the demographical knowledge on the network, and 
import the acquired knowledge into the mathematical model? This is an inverse 
problem similar to the network tomography [Macno 2009 , [Rabbat 2008 . 

In this study, a statistical method is presented to discover the effectively 
decisive topology of a heterogeneous network and reveal the parameters which 
govern stochastic transmission from a dataset on the early growth phase of 
the outbreak. The dataset consists of either the number of infectious per- 
sons or the number of new cases per an observation interval. The method 
is founded on a mathematical model for a stochastic reaction-diffusion pro- 
cess [Baronchcl li 2008) . The population in the model is described by a set of 
Langevin equations. The equations are stochastic differential equations which 
include rapidly fluctuating and highly irregular functions of time. Probability 
density functions and likelihood functions are derived from the equations ana- 
lytically, and used for the maximal likelihood estimation of the topology and 
parameters. The method is tested with a number of computationally synthe- 
sized datasets and the World Health Organization (WHO) dataset on the SARS 
outbreak in March through April in 2003. 

2 Problem 

2.1 Stochastic model 

The model in this study is a special case of a stochastic reaction-diffusion pro- 
cess. The model is a combination of standard epidemiological SIR or SIS com- 
partment models and a meta-population network model. The meta-population 
network model [Colizza 2007 sub-divides the entire population into distinct sub- 
populations in N geographical regions. Movement of persons occurs between 
the sub-populations while the epidemiological state transitions (infection and 
recovery) occur in a sub-population. A sub-population is randomly well-mixed. 
Heterogeneity is present between sub-populations. 

The geographical regions are represented by nodes [i = 0, 1, • • • , N — 
1). The movement is parameterized by a matrix 7 whose i-th row and j-th 
column element jij is the probability at which a person moves from m to rij 
per a unit time. A person remains at the same node at the probability of 1 — 
y^J^ 1 7a ■ Generally, jij — ^ji does not hold. By definition, 7^ = 0. It 
is often confirmed empirically that a simple law relates a network topology to 
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the movement 'Barrat 20041- The topology is specified by a neighbor matrix I. 
The transportation between two regions is represented by a pair of unidirectional 
links. If a pair of links is present between m and rij, Uj = Iji = 1. If absent, 
lij = Iji = 0. By definition, la — 0. In the experiments in section [^J an 
empirically confirmed law jij = Tij(l) is postulated, and the topology and the 
probability of movement are treated interchangeably. 

The SIR compartment model Keeling 2008 is a behavioral extreme where 



immunity is life- long. The state of a person changes from a susceptible state (S), 
through an infectious state (I), to a recovered state (R). In contrast, the im- 
munity does not occur in the SIS compartment model. The state of a recovered 
patient goes back to S. The parameter a represents the probability at which an 
infectious person contacts a person and infect the person per a unit time. If the 
contacted person is susceptible, the number of the infectious persons increases 
by 1. The effective rate of infection by a single infectious person is the product 
of a and the proportion of the susceptible persons within the population. The 
parameter ft represents the probability at which an infectious person recovers 
per a unit time. These parameters are constants over subpopulations and time. 
The basic reproductive ratio r is defined by r — a/ ft [Lipsitch 2003 



Movement, infection and recovery are Markovian stochastic processes gov- 
erned by jij, a, and ft. 

2.2 Time evolution of spread 

In a stochastic process, even if the initial condition is known, there are many 
possible trajectories which the process might go along. A set of these pos- 
sible trajectories is a statistical ensemble. The change in the population is 



described by a set of Langevin equations |Hufnagel 2004 . A Langevin equation 



is a stochastic differential equation [Kloeden 1992 . The microscopic continuous 



time evolution of a system is obtained by adding a fluctuation (a stochastic 
term) to the known macroscopic time evolution of the system. 

The quantity Si (t) is the number of susceptible persons at a node n,; at time 
t. h(t) is the number of infectious persons. Ri(t) is the number of recovered per- 
sons. The change in Ii{t) (i — 0, 1, • • • , N — 1) is given byeq.® [Col izza 2006] . 
It is a set of N stochastic differential equations. 



d/ *W aSi(t)Ii(t) a T f*\j.\^„ TM ST^ TM 

\ t \ f \ / 3=0 3=0 

jV-1 N-l 
J=0 j=0 

Stochastic terms £(t) = (^(t),^ 1 ^),^ 1 ^)) are rapidly fluctuating and 
highly irregular functions of time. The number of terms is M = N' 2 + N (N 
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terms for infection, N terms for recovery, and N(N — 1) terms for movement) . 
The functional forms of individual elements f ffl (i) (a = 0, 1, • • • , M — 1) are not 
known. Their statistical property is the Gaussian white noise which satisfies 
eg.® through 

(?a (t)) ensemble = 0. (2) 
(fa (*)&>(«)) ensemble = S ab S(u - t) . (3) 
(fa(i)6(")fc(w) ' ' ' )enscmble = 0. (4) 

In these equations, 8(t) is a Dirac's delta function, and 5 a b is a Kronecker's 
delta symbol. The ensemble average of a variable x is (a;)cnscmbic- Eq.Q means 
that there is no correlation at different times and between different terms. Eq.Q 
means that the third and higher order moments vanish. 

In most cases, the outbreak is contained before the spread reaches equilib- 
rium. In the early growth phase of the outbreak, li <C Si and R4 <C Si hold 
true. The first term of the rightside of eq.([T]) is independent of Si and Ri be- 
cause Si/(Si + Ii + Ri) sa 1. The resulting equation is eq.flSJ). Eq.© can also 
be applied to the SIS model. 



df<(t) 
dt 



N-l N-l 

= ai t (t) - pii(t) + E -rMt) - E 

N-l N-l 

+ E ^Mm^Kt) - E yJyiMQ&V)- (s) 

The cumulative number of new cases until time t is represented by J; (t) (i = 
0, 1, • • • , N — 1). The rate of increase in Ji(t) equals to the first term of eq.flSJ). 
That is, ali(t). The time evolution of Ji(t) is given by eq.®. The rightside 
dose not depend on «/,(£) itself. 

^ =aJ < (<) + ^W)^ ] (t). (6) 

The total number of the infectious persons at time t is given by I(t) = 
Y^Lq 1 Ii(t)- Its time evolution is given by eq.([7]). It does not depend on the 
values of 7^. 

, T( .s N-l N-l 

^ = al(t) - pi(t) + E V^W)t ] (t) E yfPW)^®- (7) 

i=0 i=0 

The total cumulative number of new cases until time t is given by J(t) = 
Y^i^o 1 Ji(t). Its time evolution is given by eq.©. 

^ = al(t) +E V^m&Ht)- (8) 

i=0 
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2.3 Definition of problem 

The problem is to discover the network topology I (or 7) and reveal the trans- 
mission parameter r (or a and /3) from a given dataset Ii{t c i) (i = 0, 1, • • • , N — 
1, d = 0, I,--- ,£>- 1) or AJi(i d ) (» = 0, 1, • • • , N - 1, d = 0, 1, ■••,£)— 1). 
The dataset Ii(td) is the time sequence of the number of infectious persons. 
The dataset AJi(td) = Ji(td+i) — Ji(td) is the time sequence of the num- 
ber of new cases between observations. Observation is made at every node 
rii [i — 0, 1, • ■ ■ , N — 1) at times td (d = 0, 1, • • • , D — 1). The time interval 
between observations is At = td+i — td- For example, a bundle of the daily 
reports on cases from hospitals is a dataset AJj(^) where At = 1 day. Other 
information is not known. That is, nothing is known about Si(td), Ri(td), nor 
the initial condition which could identify the index case (the first patient from 
whom the infectious disease has spread). 

3 Method 

3.1 Likelihood function 

Various techniques of statistical inference can be applied once the likelihood 
function is obtained analytically. The likelihood function is the conditional prob- 
ability of the obtained dataset as a function of the unknown parameters of a 
parameterized statistical model. The conditional probability becomes noticeably 
large if the value of the parameters is close to the true value. For example, 
maximal a posteriori estimation is used to find the parameters which maximize 
the posterior distribution. In this study, the problem is solved by maximal 
likelihood estimation. The Langevin equations ([5]) through §E§ are solved by 
obtaining the moments of probability variables at time t so that the proba- 
bility density functions and logarithmic likelihood functions can be derived, 
rather than by calculating the trajectories of time-dependent variables for a 
given functional form of £ a (t) [Dangcrfie ld 2009] . Four logarithmic likelihood 
functions £ [I11 (0), LP 2 1(0), L^ n \e), and L lJ2| (0) are derived for given datasets 
h(td), I{td), AJi(td), and AJ(td) respectively under the unknown parameters 
e = {j,a,l3}. 

Appendix [XI presents the procedure to solve the Langevin equations through 
a Fokker-Planck equation |Kampen 2007] . Appendix [B] summarizes the formula 
for the time evolution of mW(t|0), m' J l(i|<?) (row vectors whose z-th element is 
the mean of Jj, Jj) and v^(t\0), v^(t\0), v^^(t\0) (matrices whose i-th row 
and j-th column clement is the covariance between Ii and Ij , Ii and Jj , Ji and 
Jj). Appendix [Cl summarizes the formula for the time evolution of mfl(t\6), 
m^(t\8) (the mean of/, J) and f [II1 (i|0), v [u] {t\8), v [ " ] {t\0) (the variance of 
/, covariance between / and J, variance of J). 
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3.1.1 Case 1: for given Ii(td) 

The logarithmic likelihood function L\ ll \G) is determined by Ii(t d )- If the 
third and higher order moments are ignored, the probability density function 
p(I ,td + i\0) is a multi-variate Gaussian distribution with the mean m^(td+i\9) 
andcovariance«P I l(t ( i+i|^) ineq.©. This is the probability of 7 = (Iq, ■ • ■ ,I n -i) 
at t = td+i given 8. X T is a transpose of a matrix X. 

[n]f m exp(-l(7-mmfa +1 |0))^"]fa +1 |0)- 1 (/-mP]fa + i|g)) r ) () 

P ' VWdetvW(t d+1 \8) 

L^(6) is the logarithm of a product of the probability of the individual 
observation I(td+i) at t = t d +i- It is given by eq. ([T0]) . 



D-2 

lW(o) = iog P M(i(t d+1 ),t d+1 \e). (io) 

If At is small, the formula for the moments become simpler. The exact 
formula for m\ l \t\0) and v^(t\8) are expanded in terms of At, and the second 
and higher order terms are ignored. If the data Ii(t d ) at t = t d is reliable 
completely, mf\td\8) = h{td) and vf^(t d \8) = 0. The moments after At are 
given by eq. tjTTj) and (TT2"1) . The moments at t = t d +i depend only on the data 
at t = td- 

N-1 N-1 

mf ] (t d +i\0) « Ii(t d ) + {(a - /3 - ^ JikWd) + nMU)W- (H) 

k=0 3=0 



N-1 N-1 

v?l ] (t d+ i\0) w [{(a + 13 + lik)Ii{U) + JkMtdiySij 

k=0 k=0 

- Hjl&d) ~ 7jilj(td)]&t- (12) 

Similarly, the logarithmic likelihood function L\ l2 \0) is determined by I(t d )- 
I(td) can be calculated from the given dataset Ii(t d ). The probability den- 
sity function p^(I,t d+ i\8) = p^(I,t d+ i\ a, j5) is a Gaussian distribution with 
the mean mfl(t d +i\8) and variance (td+i \8). It does not depend on 7. 
£l I2 l (0) — £l I2 I (a, /3) is the logarithm of a product of the probability of individ- 
ual observation I(t d +i) at t = t d +i- It is given by eq. (fT"3")) . 

D-2 

L^(a,0) = £ \ogp^(I(t d+1 ),t d+1 \a,(3). (13) 

Again, if At is small, the formula for the moments become simpler. They 
are given by eq. (fT4"|) and (IT51) . 

mM (t d+ i |0) « !(*„) + (a - p)I(t d )At. (14) 
uM(t d+1 |0)«(a + /W<i)At. (15) 
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3.1.2 Case 2: for given AJi{t d ) 

The logarithmic likelihood function L^ ,n ^(0) is determined by AJ^t^). The 
probability density function p^ ,n \j ,t ( i+i\0) is a multi-variate Gaussian dis- 
tribution in the same functional form as eq.© with the mean mi J l {td+i \9) 
and covariance v^^(td+i\0). The functional form of LP 1 ' is also the same as 
cq. (fT0|) . Ji(td+i) can be calculated from the dataset by Ji(td+i) = Ji(io) + 
^2d'=o AJi(td')- The first observation AJi(to) may include all the known cases 
at that time. Then, J, (to) can be deleted from the above formula. There is, 
however, a big difference from L^(9). The probability at t = td+i is not de- 
termined by the data Ji(td) because the moments of Ji at t = td+i depends on 
Ii at t = td whose value is not known. Such approximation as eq. (|14j) or (|15| 
is not correct. Thus, the exact formula for v^ 3 \t\6) as a function of t must be 
evaluated to calculate the value of L^ n ^(0). 

Similarly, the logarithmic likelihood function (0) is determined by J(t<j)- 
J(td) can be calculated from the given dataset AJi(t d )- The probability density 
function p[ J2 l(J, td+i\0) = pi' 72 ' (J, td+i\a, 0) is a Gaussian distribution with the 
mean m' J ' (td+i \9) and variance v^ J \td+i\0)- The functional form of L\ 32 \9) — 
I/[ J2 l(a,/3) is the same as ea. (fT5)) . The approximation in eq. lfH)) and (JT5J) is not 
correct either. Thus, the exact formula for v^^(t\0) as a function of t must be 
evaluated to calculate the value of L\ }2 \a, 0). 

3.2 Estimation procedure 

Theoretically, every formula in the following can be applied in estimating jij 
directly, as well as estimating I and obtaining 7 by the law 7^ = T^il). But, 
the estimation of N(N — l)/2 binary parameters iy (i < j) tends to be more 
robust than that of N(N — 1) continuous parameters 7y (i 7^ j). The binary 
parameters are suitable for reliable combinatorial optimization by means of well- 
established numerical algorithms and computational implementations. Thus, the 
estimation of I is detailed here and demonstrated in section [^} 

3.2.1 Case 1: for given Ii(td) 

The procedure for the estimation from Ii(td) is prsented. The problem is solved 
by dividing it to two sub-problems and solving them sequentially, rather than 
by searching the maximal likelihood estimators d, 0, and I simultaneously The 
first sub- problem is to obtain a and by solving eq. (|16l) . 

a, = argmaxL [I2] (a,/3). (16) 

a, /3 

The estimators are given by eq.([T7|) and (fT5|) where AI(td) = I(td+i) — I(td)- 
A= 1 rj yA/fa) 1 (Et^A/fa)) 2 E^AJfa) 
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The second sub-problem is to obtain the maximal likelihood estimator I 
using the obtained values of a and p. They are obtained by solving eq.(fT9|). 

I = axgmaxL [I1] (&, p,T i:j (l)). (19) 

Eq.(fP9|) can not be solved analytically. There are 10 14 possible topologies 
for N = 10, and 10 57 for N = 20. Simulated annealing [Press 2 007 is a power- 
ful meta-heuristic algorithm to solve such a combinatorial global optimization 
problem. A candidate of parameters l' is generated randomly near the present 
value of I. The parameters are updated from 7^ — Tij(l) to Tij(l ) according 
to the probability p(s) in eq. ([20| in the s-th step (s = 0, 1, • • • ) of iterations. 

p(s) = mm(exp( J - ), 1). (20) 

T(s) is the annealing temperature in the s-th step. Typical cooling schedule 
is T(s) = l/log(s + 1). Since 0(T) = 1, the scaling constant k is selected as an 
appropriate value whose order is the same as that of L^l 



3.2.2 Case 2: for given AJi(t d ) 

The procedure for the estimation from AJj(t^) is presented. Again, the problem 
is divided to two sub-problems. The first sub-problem is to solve eq. (|2Tj) . The 
quantity 1(0) is the initial value of the number of infectious persons, which 
appears in the formula for the mean and variance of J in Appendix [C] It is not 
the same as the known J(to), but an unknown parameter. 

a, j3, i(0)=arg max L [n] (a, p, 1(0)). (21) 

a, p, 1(0) 

Simulated annealing uses the probability p(s) in eq. ([2^|) for the update of a 
candidate. An alternative means to solve eq. ((2~T|) is such a function maximization 
algorithm as a BFGS quasi-Newton method Press 2007 . 

M • ( , LW(a,p,I(0))-LW(a>,p>,I(0y) ^ 
p(s) = mm(exp( — ), 1). (22) 

A great difficulty in maximizing L^ }1 \6) is encountered in solving the sec- 
ond sub-problem. The very complex formula for i>[ JJ l(£|0) to obtain the value 
of L[ J1 ](0) is not tractable even numerically unless N is very small. An approx- 
imation is introduced to convert this problem to the computationally tractable 
second sub-problem in 13.2.11 The valued of Ii(td) is approximately obtained 
from the value of AJi(td) by eq. (|2"3"|) . which use the already obtained value of 
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a. Eq.(fT9"|) is solved with the converted values of Ii(td) instead of maximizing 
£[ J1 ](0) directly. 



m) « (23) 

Eq. (|23|) is a discrete time approximation of eq.(j6|) for small At. This rela- 
tionship holds true for the mean values of Ii(td) and AJj(i<j). But the variance 
oiIi(td) is overestimated by neglecting the stochastic term \foiT i ^° i \ Because of 
the approximation, the estimation from AJi(td) would be more erroneous than 
that from Ii(td). The estimation errors are demonstrated in section 2J 



4 Experiment 

4.1 Computationally synthesized dataset 

A number of test datasets are synthesized by numerical integration [Kloeden 1992 
of a Langevin equation ([T]) for random network topologies and transmission pa- 
rameters. The network is a Erdos-Renyi model in a combination of N and 
the average nodal degree (ki). The nodal degree of a node rii is given by 
ki — X)^) 1 hj- The probability at which Uj = 1 is (ki) /(N — 1). 

It is postulated that the total number of persons who moves from n, to rij 
per a unit time is proportional to y^kjtj if a link is present. This law is known 
valid generally for the world-wide airline transportation network |Barrat 2004] . 
It is also postulated that the initial population Pi(0) = Si(0) + li(0) + Ri(0) 
of a node rij is proportional to the total number of persons who outgoes from 
the node per a unit time. Consequently, 7^- is determined as a function of I by 
cq. (|2"4"|) . The fraction of persons who outgoes per a unit time is a constant 7 
over the network. This is an additional unknown parameter in solving cq. (|19[) . 
The law in eq.(f24"f is used in discovering the network topology by eq. (fT9|) as well 
as synthesizing the datasets computationally. 



lij \ jkikj 

^2j=o lij^/kik 



Tfr = r«(i)= "7 7 . (24) 



.Pi(O) is given by eq. (|25|) . The total population is set to P = 10 6 iV in the 
experiment. 



m - _£^^— p. (25) 



The estimation error of the basic reproductive ratio is defined by eq. (|26|) . It 
is a relative absolute deviation from the true value. 

Sr= M = M. (26) 

r a/p 
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The estimation error of the topology is defined by cq. (|2"7|) . It is the fraction 
of links whose presence or absence is estimated wrongly. 

„ _ S i<3 \hj ~ hj\ 

El ~ N(N -l)/2 ' (2?) 

Figure [7] illustrates an example of the network topology estimated from a 
computationally synthesized dataset. The graph [A] shows the dataset Ii(t d ) 
with At = 1 and D = 100 when r = 2. The drawing [B] shows the topology 
with N = 10 and (ki) = 3 to synthesize the dataset. The index cases appear 
at hq. The network includes a core sub-structure consisting ofnQ, ri2, n^, n§, 
and ng . It is nearly a clique where every node is connected to every other node. 
Links are present except for the one between n$ and ng. The drawing [C] shows 
the topology estimated from the dataset. The error is E\ = 0.18. The core is 
discovered correctly. The ability of the method is surprising in distinguishing the 
only pair of nodes where the link is absent. The links from the core to n\ and n>j 
are discovered. Although the method identifies that m, ne, and ng do not belong 
to the core, but form the stubs ( dead ends ) from the core, it fails to estimate 
how they are connected to each other and the core. The number of cases is the 
smallest at these nodes. The movements of infectious persons to and from them 
are so infrequent that the analysis on them is not so reliable. 

The estimation error Ei of the method in this study is compared with those 
of a naive estimation and a mere random guess. The naive estimation relies 
on the correlation between nodes. When an infectious person moves from m 
to rij, Ii decreases and Ij increases by one simultaneously. Thus, intuitively, 
the negative correlation of the change in Ii and Ij is the signal of the presence 
of a link. The correlation pij between rij and rij is calculated by eq. i28\) where 
M i (t d ) = I i (t d+1 )-I i (t d ). 



D-2 N-l 1 N-l 

= (Mi(td) - ^ E U k (t d ))(Mj(U) - ^ E A/ fc(M)- (28) 

d=0 k=0 k=0 



Note that^2 k AI^T^it^/N is not the average over the time sequence for , 
but the average over the nodes at t = t d . This formula is supposed to exclude 
the positive correlation because of the common growing trends in Ii ( dependent 
on (a — f3)IiAt in eg. Ul\) ). The naive estimation predicts lij = 1 if pij < 0. 
The random guess is the worst bound of estimation. The number of links whose 
presence or absence is predicted wrongly obeys a binomial distribution. The 
mean and standard deviation of Ei are 0.5 and 0.0745 for N = 10, and 0.5 and 
0.0256 for N = 20 theoretically. 

Figure&shows E\ for various values of the normalized average degree (ki)/(N- 
1) ((ki) = 2,3,4 for the number of nodes N = 10) when r = 2, and Ii(t d ) with 
At = 1 and D = 100 is given as a dataset. For small (ki), the naive estimation 
does not work at all. As (ki) increases, Ei of the method increases and that 
of the naive estimation decreases. For large (ki) , the naive estimation becomes 
less erroneous than the random guess. But, it never surpasses the method in 
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Figure 1: Example of the network topology estimated from a computationally 
synthesized dataset. [A]: datasct Iiitj) with At = 1 and D = 100 when the 
basic reproductive ratio is r = 2 (a = 0.067, /3 = 0.033,7 = 0-1)- Individual 
curves represent the nodes. [B]: random network topology with N = 10 and 
(ki) = 3 to synthesize the dataset in [A]. The index cases appear at uq. At 
t = igg, h > I a > lo > h > h > h > h > h > h > h- [C]: network topology 
estimated from the dataset in [A]. 
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Figure 2: Estimation error Ei for various values of the normalized average 
degree (ki)/(N — 1) ((hi) = 2,3,4 for the number of nodes TV = 10) when 
r = 2 [a = 0.067, f3 = 0.033,7 = 0.1), and h(t d ) with At = 1 and D = 100 
is given as a dataset. The initial condition is Iq(Q) = 200 and li(0) = for all 
i 7^ 0. The individual plots show the mean and standard deviation over trials for 
100 different random networks, [a]: maximal likelihood estimation (the method 
presented in this study), [b]: naive negative correlation estimation, [c]: mere 
random guess (theoretically 0.5 ± 0.0745). 

this study. The initially heterogeneous node-to-node distribution of infectious 
persons relaxes more quickly in the networks having more links. For example, 
the standard deviation of ij(tgg) is about 400 for (ki) = 2, and about 300 for 
(fcj) = 4, while the mean for both (ki) is about 550 (tz> Iq(0) exp((a—j3)DAt)/N). 
The growing trends also become more homogeneous. Under such homogene- 
ity, the negative correlation implies the movements between nodes directly. The 
naive estimation may be a substitute if assuming homogeneous distribution dur- 
ing observation are well-grounded. 

Figure [3] shows the estimation errors Ei and E r for various values of the 
normalized average degree (ki)/(N — 1) ((ki) = 2,3,4 for N = 10 and (ki) — 
3, 6, 9 for TV = 20), N, and r when £(t d ) with At = 1 and D = 100 is given as 
a dataset. The findings are as follows. 

• As (ki) increases, E\ increases from around 0.2 to 0.4. Although Ei (the 
average ± standard deviation) remains less than 0.5 within the range of 
the experimental conditions here, the estimation comes close to a mere 
random guess (Ei — > 0.5) for the dense network limit (— > a complete 
graph) . Discerning the presence or absence of links becomes more difficult 
as the spread goes on over more links in parallel and reaches more nodes 
along more possible routes. On the other hand, E r does not change largely 
as (ki) changes. 
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• As the network becomes larger, E\ increases and E r decreases. As the 
model becomes more complex (the number of links 0(N 2 ) becomes larger 
compared to the amount of data O(ND)), there may appear more optimal 
or sub-optimal topologies. Choosing a unique right answer becomes more 
difficult from such similar candidates. On the other hand, the central limit 
theorem guarantees that the fluctuation decreases as the network becomes 
larger because a and /3 are estimated from the sum of N probability 
variables (I(t d ) = X^o* ■*»(*<«))• 

• As r increases, Ei increases (but the difference between r = 4 and r — 6 
is very small) and E r increases from around 0.1 to 0.35. The observations 
can not track down the rapid reproduction of patients when rAt is large. 

The dependence of the errors on 7 in eq. is investigated. The errors 
increase from Ei = 0.2 and E r = 0.092 for 7 = 0.1 in [a] of figure^ [A] to E\ = 
0.26 and E r = 0.10 for 7 = 0.2, and E t = 0.32 and E r = 0.099 for 7 = 0.4. The 
accuracy of estimation is limited when many persons move between nodes in both 
directions because of large 7. The dependence of the errors on At is investigated. 
The errors increase from Ei = 0.2 and E r = 0.092 in [a] of figure^ [A] to E\ — 
0.31 and E r — 0.18 if the observations are made four times less frequently (At = 
4, D = 25). But they are improved only slightly to Ei = 0.18 and E r = 0.074 
if the observations are made 4 times more frequently (At = 0.25, D = 400/ 
Small time interval between observations is relevant to accurate estimation. The 
errors are investigated for various initial population distributions Pi (0) . If the 
population is a thousandth (P = 10 3 N), E x = 0.23 and E r = 0.091 for N = 10, 
(ki) = 2, and r = 2 when Iq(0) = 20. If Pj(0) oc ^2^=0* kj{hkj) 4 rather than 
^jLo 1 hjy/kikj in eg. h25\) . Ei = 0.25 and E r — 0.09 when the population is 
P = 10 6 A. In this case, the population ranges in vastly diverse scales. The 
ratio of the population of the most populated node to that of the least populated 
node is P max (0)/P mi n(0) « 2000 while J P max (0)/P mi „(0) « 7 in case of ea.(Z5\). 
E r is not affected by the distribution. E\ increases when much less populated 
nodes are present. But Ei still remains small. 

Figure|4]shows the estimation errors Ei and E r for various values of (ki) / (N— 
1), N, and r when AJj(trf) with At — 1 and D = 100 is given as a dataset. The 
experimental conditions are the same as those for Figure [H The findings are as 
follows. 

• The dependency of E\ and E r on (ki), N, and r in figure U is similar to 
those in figure [3] 

• The absolute value of errors tends to increase. For example, E\ = 0.31 in 
figureUis much larger than E\ = 0.2 in figure [3] under the same experimen- 
tal conditions (ki) =2, N = 10, and r = 2. In contrast, the increase in E r 
is relatively small. E r is nearly the same when r = 6. The deterioration 
in estimating the topology, therefore, seem to result from the influence of 
the approximation in eq. (f23|) . 
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As a summary of the experiments, the estimation is particularly reliable 
(Ei ~ 0.2 and E r ~ 0.1) when ij(td) for a slow reproduction over a sparse 
network is used as an input. Such performance can not be achieved by the 
naive estimation. The estimated topology from AJj(id) is more erroneous than 
that from Ii(td) by as much as 50%. 



4.2 SARS dataset 

SARS is a respiratory disease in humans caused by the SARS corona-virus. 
The epidemic of SARS appears to have started in Guangdong Province of south 
China in November 2002. SARS spread from the Guangdong Province to Hong 
Kong in early 2003, and eventually nearly 40 countries around the world by July. 
WHO archives the cumulative number of reported probable cases of SARS3 The 
dataset in the archive had been updated nearly every day since March 17. It is a 
time sequence dataset Ji(td) with At = 1 day. In this study, the target geograph- 
ical regions are those where five or more cases had been reported in a month 
since March 17. They include Canada (CAN), France (FRA), United Kingdom 
(GBR), Germany (GER), Hong Kong (HKG), Malaysia (MAS), Taiwan (ROC), 
Singapore (SIN), Thailand (THA), United States (USA), and Vietnam (VIE). 
Mainland China is not included because no data is available in some periods 
and no data outside of Guangdong Province is reported in other periods. 

Figure \S\ shows the date when the first patient appeared and the propagating 
wavefront of the spread. It is almost certain that neither FRA nor MAS are the 
origin of the outbreak. But, nobody can tell the chain of transmission among 
CAN, HKG, SIN, GBR, ROC, and USA in just two days from March 17 to 
19 reliably. The wavefront is not as informative as anticipated. Such a naive 
gleaning verifies the obvious series of events at best. 

The estimated transmission parameters are a — 0.18 and f3 — 0.13. The ba- 
sic reproductive ratio is f = 1.4. According to the field-based medical case stud- 
ies, the basic reproductive ratio (except for super-spreading events |Fujie 2007] ) 
was r = 2.7 in February and went down to r — 1 in late March in Hong Kong 



Riley 2003 , and r = 7 in February and r = 1 in early March in Singapore 



Lipsit ch 2003| . The decrease of r is due to the quarantine, hospitalization and 



public awareness starting to take effects after WHO issued a world-wide alert 
on March 12. But the spread of SARS was still going on world-wide. The value 
slightly greater than 1 seems reasonable at that stage. 

It is postulated that the law in ea. ([24|) holds true in analyzing the SARS 
dataset. The topology which achieves the largest value of the logarithmic like- 
hood function among many trials is chosen. This is efficient in rejecting the 
local maximum to which simulated annealing may converge. The trials use dif- 
ferent random number sequences to generate nearby candidates I in eq. (|20[) . 
Figure E] shows the estimated topologies I. The topology [A] is the most likely 
(the largest value of the likelihood L = —9985). The best 30 trials out of 300 



1 World Health Organization, Cumulative number of reported probable cases of SARS, 
http: / /www. who. int/csr/sars/country /en/index. html (2003). 
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Figure 3: Estimation errors E\ and EV for various values of the normalized 
average degree (h)/(N - 1) ((hi) = 2,3,4 for N = 10 and (fc,) = 3,6,9 for 
N = 20), the number of nodes N, and the basic reproductive ratio r when 
Ii(td) with At = 1 and D = 100 is given as a dataset. The initial condition is 
/o(0) = 200 and /j(0) = for all i ^ 0. The individual plots show the mean 
and standard deviation over trials for 100 different random networks. [A]: Ei 
for r = 2 (a = 0.067, /3 = 0.033, 7 = 0.1). [B]: E t for r = 4 (a = 0.08, /3 = 0.02, 
7 = 0.1). [C]: for r = 6 (a = 0.086, /3 = 0.014, 7 = 0.1). [D]: E r for r = 2 
(the same as [A]). [E]: E r for r = 4 (the same as [B]). [F]: E r for r = 6 (the 
same as [C]). 
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Figure 4: Estimation errors Ei and E r for various values of (ki)/(N — 1), N, 
and r when AJj(id) with At = 1 and D = 100 is given as a dataset. The 
experimental conditions are the same as those for Figure |3] 
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Figure 5: Date when the first cases appeared and the propagating wavefront of 
the spread in the WHO dataset on the cases of SARS. 



trials converge to [A]. It includes 11 links ((ki)/(N — 1) = 0.2). The topology 
[B] is the second most likely (L = —9998). The next 5 trials converge to [B]. 
It include 13 links ({ki)/(N - 1) = 0.24). The topology [C] is the third most 
likely (L = -10012). The next 29 trials converge to [C]. It include 14 links 
{(h) /(N - 1) = 0.25). About 20% of the trials converge to either [A] or [C]. 

The sub-structures common in all of [A], [B], and [C] are a star from HKG 
to CAN, ROC, and SIN, another star from USA to GBR, MAS, and VIE, and 
a link between the centers of these stars (HKG and USA). A triangle between 
USA, VIE, and THI appears in [A] and [C]. The likelihood seems sensitive to the 
topological whereabouts of GER and FRA given these common sub-structures as 
a core of the network. This may happen to make [BJ a tall but narrow peak 
in the landscape of the likelihood, which simulated annealing sometimes fail to 
discover. In addition to these sub-structures, a few remarkable points are seen 
in these topologies. The nodes SIN and ROC are stubs where fcj = 1. The 
role of SIN is not so relevant in spreading the disease despite the fact that the 
number of cases there was more than 100 in the middle of April. The nodes 
CAN and USA have links to distant geographical regions, and USA is a hub 
(hi is the largest). They are relevant intermediate spreader nodes. The links 
around GER are not stable among the three topologies. The number of cases 
in some European countries is too small to draw a reliable conclusion. 

The estimated topologies are not meant to reproduce the trajectories of 
individual patients' movement, but rather demonstrate some demographical in- 
teractions within the macroscopic world-wide transportation behind the SARS 
outbreak. Nevertheless, the sub-structures mentioned above seem to be con- 
sistent with the following publicly known series of events on some individual 
patients' microscopic movements. 

• Two of the index patients in Toronto in Canada, three of the index pa- 
tients in Singapore, and another three of the index patients in the United 
States stayed a hotel in Hong Kong where a Chinese nephrologist, who 
had treated many patients in Guangzhou and become infected, was stay- 
ing in late Februarjjf|. This event implies the links from HKG to CAN, 

2 SARS Expert Committee (Hong Kong), SARS in Hong Kong: from experience to action, 
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[A] [B] [C] 




Figure 6: Estimated topologies from the WHO dataset on the cases of SARS 
in Canada (CAN), France (FRA), United Kingdom (GBR), Germany (GER), 
Hong Kong (HKG), Malaysia (MAS), Taiwan (ROC), Singapore (SIN), Thai- 
land (THI), United States (USA), and Vietnam (VIE) from March 17 through 
April 17. [A]: the most likely topology. [B]: the second most likely topology. 
[C]: the third most likely topology. 

SIN, and USA form a chain of transmission in the early growth phase of 
SARS outbreak. 

• A garment manufacturer from the United States became infected during 
the stay in Hong Kong on the way to Hanoi in Vietnam, showed symptoms 
there, and was evacuated to a hospital in Hong Kong |GreenfeId~2 006 . 
An Italian physician, who treated him at a hospital in Hanoi, showed 
symptoms in Bangkok in Thailand where he would attend a conference in 
early March. These events imply that the interactions among HKG, USA, 
VIE, and THI are present potentially, which could result in another chain 
of transmission. 

The WHO dataset is not of perfectly reliable quality. Particularly, the data 
on mainland China is of poor quality, and can not be used in this study. Even the 
individual number of cases which was reported from the other local governments 
may not be accurate. The number of cases is highly fluctuating and seems noisy. 
Data in a city-level resolution, rather than nation-level, would be necessary 
for accurate estimation when large countries like USA play an important role 
as a spreader. It is surprising that, in spite of these limitations, the method 
reproduces some characteristics of the network over which SARS spread from 
Hong Kong to Southeast Asia and North America. 

http://www.sars-expertcom.gov.hk/english/reports/reports/reports_fullrpt.html (2003). 
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5 Conclusion 



The method presented in this study solves an inverse problem to discover the 
effectively decisive network and reveal the transmission parameters from the 
observation Ii(td) or AJi(td) on the spread of an infectious disease. The find- 
ings with test datasets are that the estimation is particularly successful when 
the topology is sparse and reproduction is slow, and that the estimation from 
A.Ji(td) is more erroneous than that from Ii(td)- The network topology discov- 
ered from a seemingly noisy dataset on the SARS outbreak reproduces some 
characteristic patterns of the spread from Hong Kong to Southeast Asia and 
North America. So far, a great effort has been made to get a complete picture 
of how an infectious disease did and will spread from the found pieces of an 
epidemiological jigsaw puzzle. The method presents new pieces from a view- 
point of macroscopic transportation. These pieces can be put together with 
the pieces found in the conventional field-based medical case studies on the 
individual patients' microscopic movements. 

The method can be extended to apply to a more practical situation. The 
experimental condition in this study is an extreme where nothing but Ii(td) 
or AJi(td) is known and no informative prior knowledge is available. If some 
demographical statistics on the traffic between cities or the findings on the past 
contacts between individual patients are available, the consequent posterior dis- 
tribution enables more comprehensive Bayesian inference. Another extension is 
to employ more complicated but realistic epidemiological compartment models. 
Latent period (infected but not infectious) and hospitalization are relevant for 
some diseases. The dependence of recovery on time ((3 ^ constant) is realistic 
for other diseases. Strictly speaking, the time interval from infection to recovery 
and from a movement to another obeys appropriate probability density functions. 
Analytical treatment of the stochastic process with these effects tends to be con- 
siderably difficult. The estimation may count on such a numerical method as a 
Markov-chain Monte- Carlo sampling. 

In addition to the extensions to the method, it is sometimes critical to gather 
such a larger dataset as a collection of multiple independent time sequences 
starting from different index cases. Less erroneous estimation may be possible 
even for the regions whose population is small or where the number of cases is 
small. Such a dataset is not available for SARS, but possibly for influenza which 
spreads around the world in seasonal epidemics. Understanding the landscape 
of the likelihood functions is essential in identifying the requisites for a dataset, 
a network topology, and a mathematical model of disease transmission to make 
the inverse problem well-posed and stabilize the solution. This remains the 
challenge for the future. 
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A Probability density function 

A generic form of a Langevin equation for multiple time-dependent variables 
Xi(t) is given by eq. ([29|) . The fluctuations £ a (i) are stochastic terms. 

, , t s M-l 

—^-= fH (x (t),--- ,xjv-i(<))+ ^aiMt),--- ,x N ^{t))Ut)- (29) 

Eq. (|2"9")) can be solved by deriving the probability density function p(x,t) 
for probability variables x = (xq, ■ ■ ■ ,xn_i) at time t. The time evolution of 
p(x,t) is given by the Fokker-Planck equation in eq. (j3"0)) . 

i— 

The coefficiets and -By are given by eq. (j3"Tj) and (f3"2"j). 

i4 < (aj)=|i < (x). (31) 



Af-l 

B ij( X ) = J! <^ia(x)cT ja (x). (32) 
a=0 

The mean (the first order moment) of 2;$ at i is given by rrii(t) = (xi)t — 
J Xip(x,t)dx. The time evolution of mi(t) is given by eq. ([33"|) . It is derived 
by multiplying eq. ([30| by x and partial integration under the condition where 
p and dp/dx decay more rapidly than Ai and Bij near the boundary of the 
domain of x. 

^ = <*(*)>*■ (33) 

The covariance (the second order moment) between xi and Xj at t is given by 
Vij(t) = (xiXj)t — rrii(t)mj(t). The time evolution of is given by eq. (|34|) . 

Derivation is similar to that for eq.(|33j). 

dt 



= (B l3 (x)) t + ( Xl A 3 (x)) t + (Ai(x) Xj ) t . (34) 



Higher order moments can be obtained recursively as a solution of the dif- 
ferential equations which include the calculated lower order moments. 



B Moments of Ii and Jj 

Eq. ([35| through (|39|) are the differential equations for the time evolution of the 
first and second order moments of Ii and Ji. The symbols m\ l \t\6), m^(t\8) 
are the row vectors whose i-th element is the mean of Ii, Ji, and v^(t\6), 
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■yP J l(t|0), t>[ JJ ](£|0) are the N x N matrices whose i-th row and j-th column 
element is the covariance between Ii and Ij , Li and Jj , Ji and Jj . The unknown 
network topology and transmission parameters are represented by a symbol 
= { 7 ,a,/3}. 

0) =m^t\9)a\ (35) 
dmW(t\9) {l] 



dt 



am w (t\8). (36) 



dvli y^ = av im m + v m ma T + {B)t , (37) 



d^[ IJ ](t|6>) 
dt 

dt 

Definitions of the N x N matrices a, B, and c which appear in eq. ([35 
through p9|) are given by eq. (|40|) through (|42j). 



au [IJ] (i|0) + a(w (i|0) + c(t)). (38) 



a(«[ IJ I(t|6/) + t;[ IJ l(t|6») T + c(i)). (39) 



= (a-/3-^7 ifc )%+7ji- (40) 



fc=0 



N-l JV-1 



#ij = {(a + /3 + ^ lik)k + X! lkih}kj ~ lijh - T/i-fj- ( 41 ) 



k=0 k=0 



Cl] (t) = 8 l3 mf(t\e)- (42) 
Eq. (|43|) through (|47f are the solutions. .E is a unit matrix. 

mM(t|(9) = J(0)exp(a T i). (43) 

m [J1 (t|0) = I{Q){a(a T )- 1 exp(a T t) - a(a T )- 1 + E}. (44) 

v [II] (<|6/) = [ exp(a(t-t')) (B) t , exp(a T (i - £')) dt'. (45) 

v P J ](t|6>) = / aexp(o(t-0) ) + c(0) dt'. (46) 

Jo 
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tf [JJ1 (t|0) = [ a(v^(t'\e)+v^(t'\8) T + c(t')) dt' 
Jo 



a 2 / exp(a(t' - t")) (v [u] (t"\0) + c(t")) + 
Jo 

{v^{t"\8) + c(t")) exp(a T (i' - t")) dt" + ac(i') dt' . (47) 

C Moments of / and J 

Eq. (|4"51) through (fB"2")) are the differential equations for the time evolution of the 
first and second order moments of / and J. The symbols mfi(t\9), m\^(t\8) 
are the mean of /, J, and v^(t\8), v^(t\8), v^^(t\8) are the variance of /, 
covariance between / and J, variance of J. 

dmM(t|0)_,„, a\ [i] 



dt 



(a-/3)m llJ (t|0). (48) 



dm yV=arnV m . (49) 
d^ n (*|0) =2 (a-/3)i I [ II ](t|0) + (a + /?)m[ I ](^). (50) 



dt 



= ( Q _ fl v M m + a(vM (t\8) + mW (t|<9)). 
dz,^(t|0) =a{2v HJ] m+m [i] {tm 



(51) 
(52) 



Eq. (|55|) through ([5T)l are the solutions. 

m [I] (t|6/) = 1(0) exp(a - /3)t. (53) 

m[ J l(i|0) = I(0)(-% exp(a - fit - -^). (54) 
a — p a — p 

v^(t\8) = I (0)^4 (exp2( a - p)t - exp( a - /3)t). (55) 
a — p 

^ [IJ] = IiOH ^^l ex P 2(a ~/3)t 

,a(a + B) 2aB , , „. . ,_„. 

- ( ? J, 2 + ^expa-/3t}. 56 

(a — p) z a — p 

«[ JJ l(i|0) = J(0)[ " 2 (Q + f exp2(a-/3)f 
(a - jiy 

{ ^F + (^F 0eXp(Q "^~ (a-/3)3 ] - (57) 
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